Prototyping Virtual Data Technologies in ATLAS Data Challenge 1 Production

Authors

  • A. Vaniachine
  • D. Malon
  • P. Nevski
  • Kaushik De
Abstract

A worldwide computing model, embracing a global data and computation infrastructure, is emerging to answer the LHC computing challenges. A significant fraction of the ATLAS Data Challenge 1 (DC1) was performed in a Grid environment. For large production tasks distributed worldwide to run efficiently, it is essential to provide shared production management tools composed of integrable and interoperable services. To enhance the ATLAS DC1 production toolkit, we introduced and tested a Virtual Data services component in the data management architecture for distributed production in ATLAS DC1. For each major data transformation step identified in the ATLAS data processing pipeline (event generation, detector simulation, background pile-up and digitization, etc.), the Virtual Data Cookbook (VDC) catalogue encapsulates the specific data transformation knowledge and the validated parameter settings that must be provided before the data transformation is invoked. Because Virtual Data technologies were in the prototyping stage at the start of DC1, the data volume allocated for production tests of the virtual data system was limited to about one fifth of all the DC1 data. To provide local-remote transparency during DC1 production, the VDC database server delivered, in a controlled way, both the validated production parameters and the templated production recipes for thousands of event generation and detector simulation jobs around the world, simplifying the production management solutions. The major benefit of Virtual Data technologies was demonstrated by simplifying the management of the parameter compositions, which were different for each of the more than two hundred datasets produced in DC1. A significant reduction in the parameter management overhead enabled successful processing of about half of all the DC1 datasets (representing 20% of the data) using the VDC services.
Another benefit of Virtual Data Cookbook technologies is the simplification of the data reprocessing step. We have found it useful to distinguish (both conceptually and in the production system design) the data required before the invocation of the transformation from the information collected after the data transformation completes, that is, the data provenance. We further envision that templated recipe catalogues (experiments’ “cookbooks”), which encapsulate production gurus’ knowledge in the ‘provender’ data that are necessary before the data transformation can be invoked, will be integrated in a coherent system utilizing the Chimera technology from the GriPhyN project. The Chimera system eliminates the ‘manual’ tracking of the data dependencies between separate production steps and enables multi-step compound data transformations on demand.
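The abstract's distinction between what must exist before a transformation is invoked (a templated recipe plus validated parameters) and what is recorded afterwards (provenance) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration only: the class names, the `simulate` command line, the dataset name, and the parameter names are assumptions and do not reflect the actual VDC schema or the ATLAS production tools.

```python
from dataclasses import dataclass, field
from string import Template

# All names below are hypothetical; they are not taken from the real VDC.


@dataclass(frozen=True)
class Recipe:
    """Templated production recipe for one transformation step."""
    step: str
    template: str  # command line with $placeholders


@dataclass
class Cookbook:
    """In-memory stand-in for a catalogue of recipes and validated parameters."""
    recipes: dict = field(default_factory=dict)     # step -> Recipe
    parameters: dict = field(default_factory=dict)  # (dataset, step) -> dict

    def add_recipe(self, recipe):
        self.recipes[recipe.step] = recipe

    def add_parameters(self, dataset, step, params):
        self.parameters[(dataset, step)] = dict(params)

    def render(self, dataset, step, **job_args):
        """Combine the recipe with validated and per-job values."""
        values = {**self.parameters[(dataset, step)], **job_args}
        # substitute() raises KeyError if a placeholder has no value,
        # so an incompletely specified job is rejected before it runs.
        return Template(self.recipes[step].template).substitute(values)


@dataclass
class ProvenanceRecord:
    """Information collected after the transformation has completed."""
    dataset: str
    step: str
    command: str
    exit_code: int


def run_job(cookbook, dataset, step, executor, **job_args):
    """Render the command before invocation, record provenance after it."""
    command = cookbook.render(dataset, step, **job_args)
    exit_code = executor(command)
    return ProvenanceRecord(dataset, step, command, exit_code)


cookbook = Cookbook()
cookbook.add_recipe(Recipe(
    "simulation",
    "simulate --geometry $geometry --events $events --partition $partition",
))
cookbook.add_parameters("dataset_0001", "simulation",
                        {"geometry": "layout_A", "events": 100})

# The executor here is a stub that pretends every command succeeds.
record = run_job(cookbook, "dataset_0001", "simulation",
                 executor=lambda cmd: 0, partition=7)
print(record.command)
```

In this sketch the per-dataset parameters live in one place and only the per-job value (`partition`) is supplied at submission time, which is one way to read the reduction in parameter management overhead that the abstract describes.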


Similar resources

Integrated Model-Based Manufacturing for Rapid Product and Process Development

The paper presents an integrative model-based approach to the application of virtual engineering technologies in rapid product and process design and manufacturing. This has resulted in the integration of so-called CA technologies and Virtual Reality in product design, and of FE numerical simulations and optimization of production processes, as digital prototyping of products and processes, on the one side, and r...


Atlas Data Challenge Production on Grid3

We describe the design and operational experience of the ATLAS production system as implemented for execution on Grid3 resources. The execution environment consisted of a number of grid-based tools: Pacman for installation of VDT-based Grid3 services and ATLAS software releases, the Capone execution service built from the Chimera/Pegasus virtual data system for directed acyclic graph (DAG) gene...


Virtual Environments for Prototyping Tier-3 Clusters

The deployed hierarchy of Tier-1 and Tier-2 data centers, as organized within the context of the Worldwide LHC Computing Grid (WLCG), has without question been exceedingly successful in meeting the large-scale, group-level production and grid-level data analysis requirements of the experiments in the first full year of LHC operations. However, the plethora of derived datasets and formats thus ...


Integration of extracellular RNA profiling data using metadata, biomedical ontologies and Linked Data technologies

The large diversity and volume of extracellular RNA (exRNA) data that will form the basis of the exRNA Atlas generated by the Extracellular RNA Communication Consortium pose a substantial data integration challenge. We here present the strategy that is being implemented by the exRNA Data Management and Resource Repository, which employs metadata, biomedical ontologies and Linked Data technologi...


The Data-flow System of the ATLAS DAQ and Event Filter Prototype "-1" Project

A prototyping project has been undertaken by the ATLAS DAQ and Event Filter group to design and implement a fully functional vertical slice of the ATLAS DAQ and Event Filter. It supports the evaluation of hardware and software technologies as well as their system integration aspects. This paper describes the Data-flow component, its design, implementation and performanc



Journal:
  • CoRR

Volume cs.DC/0306102  Issue 

Pages  -

Publication date: 2003